Diagnostic implications of ubiquitination-related gene signatures in Alzheimer's disease

The purpose of this study was to explore the diagnostic implications of ubiquitination-related gene signatures in Alzheimer's disease. In this study, we first collected 161 samples from the GEO database (including 87 in the AD group and 74 in the normal group). Subsequently, through differential expression analysis and the iUUCD 2.0 database, we obtained 3450 Differentially Expressed Genes (DEGs) and 806 Ubiquitin-related genes (UbRGs). After taking the intersection, we obtained 128 UbR-DEGs. Secondly, by conducting GO and KEGG enrichment analysis on these 128 UbR-DEGs, we identified the main molecular functions and biological pathways related to AD. Furthermore, through the utilization of GSEA analysis, we have gained insight into the enrichment of functions and pathways within both the AD and normal groups. Further, using lasso regression analysis and cross-validation techniques, we identified 22 characteristic genes associated with AD. Subsequently, we constructed a logistic regression model and optimized it, resulting in the identification of 6 RUbR-DEGs: KLHL21, WDR82, DTX3L, UBTD2, CISH, and ATXN3L. In addition, the ROC result showed that the diagnostic model we built has excellent accuracy and reliability in identifying AD patients. Finally, we constructed a lncRNA-miRNA-mRNA (competing endogenous RNA, ceRNA) regulatory network for AD based on six RUbR-DEGs, further elucidating the interaction between UbRGs and lncRNA, miRNA. In conclusion, our findings will contribute to further understanding of the molecular pathogenesis of AD and provide a new perspective for AD risk prediction, early diagnosis and targeted therapy in the population.

Many neurodegenerative diseases like AD, are a collection of harmful and associated-prone proteins.These associated proteins are found to be ubiquitinated in many neurodegenerative diseases.Even though hazardous proteins are instantly deteriorated by proteolytic systems in healthy individuals, any systems perturbation caused by genetic differences, lifestyle or aging leads to a collection of hazardous protein composites and the outbreak of different diseases like neurodegenerative diseases.The significant targeting signal for proteolytic systems is a Ubiquitination.Ubiquitin-conjugating enzyme E2I (Ubc9) ligates small ubiquitin-related modifier (SUMO) to target proteins, resulting in changes of their localization, activity, or stability.Sumoylation of amyloid precursor protein (APP) was reported to be associated with decreased levels of beta amyloid (Abeta) aggregates, suggesting that sumoylation may play a role in the pathogenesis of AD.The association between genetic variations of Ubc9 gene (UBE2I) and late-onset Alzheimer's disease.Sporadic Alzheimer's disease (SAD) is the leading neurodegenerative disease.With the evolution of next-generation DNA sequencing technology, various inherent risk factors have been illustrated.Studies have shown that more single nucleotide variants (SNVs) have been found in genes that exist on the X chromosome from SAD patients.These variations were validated using the strictest method through the Chain termination method.In loci associated ubiquitin pathway (ATXN3L and UBE2NL) two of the inherent variants were established and have not already been elucidated as SAD inherent risk factors.However, the pathogenesis of AD is not fully understood 11 .Therefore, we speculated that ubiquitination-related genes (UbRGs) have certain diagnostic significance for AD and are involved in controlling the incidence and growth of AD.In this study, we validated our hypothesis through comprehensive bioinformatics analysis to further understand the molecular mechanism of AD.

Sample data acquisition and collation
The Gene Expression Omnibus (GEO) database, established and managed by the National Center for Biotechnology Information (NCBI), encompasses a diverse range of gene expression data, including second-generation sequencing data, chip sequencing data, single-cell sequencing data, and more 12 .We selected and downloaded dataset GSE5281 from the GEO database, which included 87 AD samples and 74 normal samples.DEGs are processed using the "R" language "limma" package and calculate adjusted P values and | logFC|.For GSE5281 gene expression profiles, P values < 0.05, and |log (FC)| > 1were selected as the cutoff values 13 .

Differential gene expression analysis between AD and normal groups
First, we input the transcriptome expression matrix and the grouping list of samples into R software.If a gene had multiple rows of expression values, its average value was taken, and the expression value was log2-transformed.Next, we cited R package limma to calculate the gene expression difference between the AD group and normal group through the function Wilcox and obtained the differentially expressed genes (DEGs) 14 .Finally, the results of differential expression analysis were output and visualized via R packages pheatmap and ggplot2.The screening conditions were |log2 (fold change, FC)| > 0.585 and adjusted P-value < 0.05 15 .

Screening of ubiquitination-related DEGs (UbR-DEGs)
Through the above differential expression analysis, we obtained DEGs between groups.Then we read the result file of the difference analysis and the list of ubiquitination-related genes (UbRGs) from the database iUUCD 2.0 (http:// iuucd.biocu ckoo.org/) 16 and took the intersection between DEGs and UbRGs to obtain UbR-DEGs.Furthermore, we utilized the R package VennDiagram to generate a Venn diagram to visualize the UbR-DEGs 17 .Additionally, we employed the R packages limma and pheatmap to visualize the expression patterns of these UbR-DEGs 15 .Fisher's exact test is used to analyze the common genes between DEGs and UbRGs.To exclude non accidental occurrences, paired t-tests were used.When P < 0.05, it is considered statistically significant.

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of UbR-DEGs
The initial step involved the conversion of UbR-DEGs names to R-recognized gene IDs using the R package org.Hs.eg.db 18 .Subsequently, we conducted GO enrichment analysis to explore UbR-DEGs and AD-related genes, including 27 biological processes (BP), cellular components (CC), and molecular functions (MF).This analysis was performed using the R packages cluster Profiler and enrichplot.Following this, the enrich KEGG function was employed to identify the key pathways associated with the molecular mechanisms of AD by enriching the pathways of UbR-DEGs.The outcomes were then visualized using the R packages enrichplot and ggplot2 19,20 .A significance level of P-value < 0.05 was adopted for statistical significance.

Gene set enrichment analysis (GSEA) between AD and normal groups
GSEA is used to evaluate the distribution trend of genes from a predefined gene set in the gene list ranked with phenotypic relevance, to judge its contribution to phenotype 21 .We used the R packages limma, org.Hs.eg.db, cluster Profiler, and enrichplot to first read and collate the expression parameter information of all transcriptome genes and the customized gene set file.We then performed GSEA enrichment analyses (including functional and pathway enrichment analyses for AD and normal groups).Finally, we visualized the results by enrichment plots 22,23 .A P-value less than 0.05 was considered as significant.

Identification of hub genes
First, we read the list file of UbR-DEGs, extracted the expression levels of UbR-DEGs according to the transcriptome gene expression matrix, and obtained the grouping information of samples.The "glmnet" package from R software (version 3.6.3)was used to perform the LASSO Cox regression model analysis.The "lambda.1se" value, which was determined by tenfold cross-validation, was set as the lambda for model fitting 24 .

Logistic regression model
We divided the GSE5281 dataset into a training set and an independent testing set to construct a logistic regression model.The samples were classified into 2 types, the control group and the AD sample group.DEG expression was included as continuous predictive variables, and the sample type was regarded as categorical responsive values to establish the logistic regression model by using glm package in R software.The screened variables by the stepwise regression method were adopted to reconstruct the model, and the p value of each predictive variable was calculated.The variables with p value ≤ 0.05 were chosen for model reconstruction and subsequent analysis 25,26 .Receiver operating characteristic (ROC) curves for 1, 3, and 5 years were plotted using area under the curve (AUC) calculated.Finally, the validation set was utilized to assess predictive accuracy and performance of UbR-DEGs the model.

Construction of lncRNA-miRNA-mRNA (ceRNA) regulatory network based on RUbR-DEGs
To determine the interaction between lncRNAs and RUbR-DEGs (mRNAs), we combined lncRNAs and mRNAs data with miRNAs data respectively to construct lncRNA-miRNA-mRNA (ceRNA) regulatory networks.First, we used TargetScan, miRanda and miRDB databases to predict the target genes (RUbR-DEGs) of miRNAs.If RUbR-DEGs were considered to be the target gene of a specified miRNA in the three databases at the same time, we would retain the RUbR-DEGs for subsequent analysis 27,28 .Then, we used the Perl script to determine the interaction between miRNA and lncRNA through the database spongeScan and obtained the network relationship file and node attribute file 29 .Finally, we used the software Cytoscape to visualize the interaction between lncRNA-miRNA-mRNA and construct the ceRNA regulatory network 30 .

GSE5281 dataset clinical characteristics
Transcriptomics data from previously published literature curated in Gene Expression Omnibus (GEO) database was searched for Alzheimer's disease and only those GEO datasets were selected where the differential gene expression analysis could be done by the GEO2R tool for gene expression data.We have analysed the gene expression datasets with GSE IDs-GSE5281.Represent the pooled data sets from the brain regions: Entorhinal cortex, hippocampus, medial temporal gyrus, posterior cingulate, superior frontal gyrus, and Primary visual cortex.Detailed information of age, number, and gender of subjects is provided in Table 1.

Identification of DEGs between AD and normal groups
By conducting differential expression analysis, a total of 3450 DEGs were identified between the AD and normal groups.Among these DEGs, 1733 were up-regulated and 1717 were down-regulated, as depicted in Fig. 1a.Furthermore, Fig. 1b displayed

Identification of UbR-DEGs in AD
A total of 3450 DEGs were identified through differential expression analysis, while 806 UbRGs were obtained from the iUUCD 2.0 database.Subsequently, 128 genes were identified as the overlapped genes, referred to as UbR-DEGs (Fig. 2a).The number of UbR-DEGs upregulated is 73, and the number of downregulated is 55.The differential expression of these UbR-DEGs between the AD and normal groups is depicted in Fig. 2b.

GO and KEGG enrichment of 128 UbR-DEGs in AD
Through the examination of GO and KEGG enrichment of the above 128 UbR-DEGs, we have identified the principal molecular functions and biological pathways of ubiquitination-related signatures to associate with AD.
As depicted in Fig. 3a, the GO enrichment analysis of these UbR-DEGs revealed that the functions of BP primarily encompass proteasome-mediated ubiquitin-dependent protein catabolic process, protein polyubiquitination, protein modification by small protein removal, protein deubiquitination, and protein autoubiquitination, among others.The functions of CC were found to be closely linked to the ubiquitin ligase complex, cullin-RING ubiquitin ligase complex, spindle, SCF ubiquitin ligase complex, and sarcomere, among others.The functions of MF were predominantly enriched in ubiquitin-like protein or ubiquitin-protein transferase activity, ligase activity, and ligase binding, among others.In the KEGG enrichment analysis, the pathways identified by UbR-DEGs were primarily focused on Ubiquitin mediated proteolysis, Cell cycle, NOD-like receptor signaling pathway, Epstein-Barr virus infection, Shigellosis, Huntington disease, Notch signaling pathway, Viral life cycle-HIV-1, TGF-beta signaling pathway, and JAK-STAT signaling pathway, among others (Fig. 3b).

Function and pathway enrichment in AD and normal groups
Through the utilization of GSEA analysis, we have gained insight into the enrichment of functions and pathways within both the AD and normal groups.Figure 4a illustrates the detection of significant enrichment in GO functions, specifically Cell-cell junction organization, Epithelial cell development, Positive regulation of vasculature development, Regulation of epithelial cell differentiation, and Regulation of vasculature development, within the AD group.Conversely, the normal group exhibited significant enrichment in Atp synthesis coupled electron transport, Mitochondrial translation, Inner mitochondrial membrane protein complex, Mitochondrial proteincontaining complex, and Organellar ribosome, as depicted in Fig. 4b.Furthermore, Fig. 4c demonstrates that within the AD group, the KEGG pathway was significantly enriched in Cytokine cytokine receptor interaction, Ecm receptor interaction, Focal adhesion, Notch signaling pathway, and Pathways in cancer.On the other hand, the normal group exhibited significant enrichment in Alzheimer's disease, Huntingtons disease, Oxidative phosphorylation, Parkinsons disease, and Proteasome, as shown in Fig. 4d.

Construction and validation of diagnostic models for AD based on six RUbR-DEGs
The least absolute shrinkage and selection operator (LASSO) regression was used to avoid overfitting and screened 6 RUbR-DEGs (KLHL21, WDR82, DTX3L, UBTD2, CISH, and ATXN3L) (Fig. 5a,b).The differential expression of these RUbR-DEGs across various clinical phenotypes, such as age, gender, and type, was illustrated in Fig. 5c.Furthermore, we calculated the risk score for each sample using the risk calculation formula provided in the supplementary document (S1_Risk score.xls).In addition, the ROC curve demonstrated that all six RubR-DEGs exhibited an area under the ROC curve (AUC) exceeding 0.650.and the AUC value of the diagnostic model's parameter risk score equalled 1.000 (Table 2, Fig. 5d), suggesting that the constructed diagnostic model possessed exceptional precision and dependability in discerning patients with AD.

Discussion
AD is the most common type of elderly dementia, which can impair patients' thinking, memory, and independence, affecting their quality of life, and even leading to death 31,32 .The exact cause of AD has not yet been elucidated.Some studies have found that the typical histopathological changes of AD are amyloid protein deposition and neuronal fiber tangles in the brain 33 , and various theories are attempting to explain this change, including www.nature.com/scientificreports/β-Amyloid protein waterfall theory, tau protein theory, neurovascular hypothesis, etc [34][35][36] .In the end, the nerve cells in the patient's brain "silently" atrophy or even die, or there are abnormalities in signal transmission between cells, leading to cognitive impairments such as memory, language, computation, and behavior 37 .Research has found that AD is the result of a combination of genes, lifestyle, and environmental factors, partially caused by specific genetic changes 4 .Ubiquitination of proteins is involved in the development of many diseases, including neurodegenerative diseases, such as AD 38,39 .Therefore, this study attempts to use comprehensive bioinformatics methods to elucidate the role of UbRGs in AD and their specific molecular functions and construct a risk diagnosis model for AD based on RUbR-DEGs.
In this study, we first collected 161 samples from the GEO database (including 87 in the AD group and 74 in the normal group).Subsequently, through differential expression analysis and the iUUCD 2.0 database, we obtained 3450 DEGs and 806 UbRGs.After taking the intersection, we obtained 128 UbR-DEGs.This indicated that these UbR DEGs play a key role in the occurrence and development of AD.
Secondly, by conducting GO and KEGG enrichment analysis on these 128 UbR-DEGs, we identified the main molecular functions and biological pathways related to AD.The GO enrichment analysis revealed that the functions are found to be closely linked to protein polyubiquitination, modification, deubiquitination and autoubiquitination, the ubiquitin ligase complex, spindle, sarcomere, ubiquitin-like protein or ubiquitin-protein transferase activity, ligase activity, and ligase binding, among others.In the KEGG enrichment analysis, the pathways identified by UbR-DEGs were primarily focused on Ubiquitin mediated proteolysis, Cell cycle, NODlike receptor signaling pathway, Notch signaling pathway, TGF-beta signaling pathway, and JAK-STAT signaling pathway, among others.Furthermore, through the utilization of GSEA analysis, we have gained insight into the enrichment of functions and pathways within both the AD and normal groups.Among them, GO functions were mainly enriched in Cell-cell junction organization, Epithelial cell development, Positive regulation of vasculature development, Regulation of epithelial cell differentiation, and Regulation of vasculature development, within the AD group.And KEGG pathways were significantly enriched in Cytokine cytokine receptor interaction, Ecm receptor interaction, Focal adhesion, Notch signaling pathway, and Pathways in cancer.
Further, using lasso regression analysis and cross-validation techniques, we identified 22 characteristic genes associated with AD.Subsequently, we constructed a logistic regression model and optimized it, resulting in the identification of 6 RUbR-DEGs: KLHL21, WDR82, DTX3L, UBTD2, CISH, and ATXN3L.A Gómez Ramos et al. 40 sequenced the Exon group of brain samples from sporadic AD (SAD) patients and normal controls.They found more single nucleotide variations (SNV) in the genes on the X chromosome of SAD patients, and verified these variants through Sanger sequencing.Two new gene variants were found in the loci related to the ubiquitination (UBE2NL and ATXN3L).UBE2NL is another protein related to the ubiquitination-de-ubiquitination process.It is also expressed in brain and it participates in parkin-dependent mitophagy, a process that can be dysregulated in AD.ATXN3L is a deubiquitinating enzyme expressed in brain and associated with Machado-Joseph disease.ATXN3L and UBE2NL were among the genes in the X chromosome showing SNVs present in all the DNA samples from SAD patients.These two genes are expressed in the brain and they have functions that could be related to SAD pathology.We compared the present data with previous loci detected in GWAS studies.,We found that a SNV at ATXN3L, locus chrx: 13337059, has been already reported (http:// www.gwasc entral.org/).It has been found that ubiquitin carboxyl-terminal Hydrolase L1 (UCH-L1) is a kind of deubiquitinating enzyme, which plays a regulatory role in proteins that target Proteasome degradation 41 .UCH-L1 is highly expressed in neurons and has been shown to promote cell viability and maintain neuronal integrity.The expression of UCH-L1 can salvage synaptic dysfunction and memory deficits in AD model mice 42,43 .Wang et al. 44 found that the inflammatory stimulator lipopolysaccharide and TNF-α activate NF-κB signaling by inhibiting its transcription leading to decreased UCH-L1 gene expression, suggesting that inflammation may impair the normal function of neurons through the interaction of NF-κB and UCH-L1.The loss of UCH-L1 activity coupled with the gain of proteinopathy function are linked to neurodegeneration such as Parkinsonism and Alzheimer's disease.In addition, the ROC result showed that the diagnostic model we built has excellent accuracy and reliability in identifying AD patients.
Finally, we constructed a lncRNA-miRNA-mRNA (ceRNA) regulatory network for AD based on six RUbR-DEGs, further elucidating the interaction between UbRGs and lncRNA, miRNA.This has important reference significance for us to deeply understand the molecular pathogenic mechanism of AD and develop potential drug targets.
However, this study also has certain limitations that need to be addressed in future research.For example, due to the limitations of database samples, it is mainly based on ubiquitination-related genes, lacking additional information on other genes involved in AD, molecular pathways, and how they are more generally related to ubiquitin, and lacking additional integrated datasets for extensive gene ontology analysis.In addition, there is a lack of subgroup analysis of AD pathological subtypes and more detailed clinical parameters to improve the construction of risk diagnosis models.At the same time, more samples need to be collected to confirm our findings through wet experiments.

Figure 1 .
Figure 1.The volcano plot (a) and heatmap (b) of DEGs expressed between the AD and normal groups.Red dots or squares indicate upregulated DEGs; Green dots or blue squares indicate downregulated DEGs.

Figure 2 .
Figure 2. Identification of UbR-DEGs in AD.(a) The Venn diagram of overlapped genes (UbR-DEGs).(b) The heatmap of UbR-DEGs expressed between the AD and normal groups.Red squares indicate upregulated genes and blue squares indicate downregulated genes.

Figure 3 .
Figure 3. Bubble plots for GO (a) and KEGG (b) enrichment analyses.The horizontal axis represents the number of enriched genes, and the vertical axis displays the name of pathways.

Figure 4 .
Figure 4. Plots of GSEA analysis.GO functional enrichment of AD (a) and normal group (b).KEGG pathway enrichment of AD (c) and normal group (d).

Figure 5 .
Figure 5. Screening of six RUbR-DEGs in AD.The deviance profile (a) and coefficient profile (b) of RUbR-DEGs screened by LASSO regression.(c) A heatmap of six RUbR-DEGs expressions between clinical phenotypes.Blue squares indicate deregulated genes; red squares indicate upregulated genes.(d) ROC curves of six RUbR-DEGs and Risk score.AUC is the area under the curve.Abscissa, 1-specificity (false positive rate).Ordinate, sensitivity (true positive rate).

Figure 6 .
Figure 6.lncRNA-miRNA-mRNA (ceRNA) regulatory network.The red rectangular node represents mRNAs (RUbR-DEGs), the blue diamond node represents lncRNAs, the green oval node represents miRNAs, and the lines between them represent interaction pairs.